Overview

Dataset statistics

Number of variables: 12
Number of observations: 6497
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1179
Duplicate rows (%): 18.1%
Total size in memory: 609.2 KiB
Average record size in memory: 96.0 B
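The memory figures are consistent with twelve columns of 8-byte floating-point values over 6497 rows. The sketch below reproduces that arithmetic; the 8-byte element size (float64) and the 128-byte index overhead are assumptions chosen to match the reported numbers, not values stated in the report.

```python
# Sketch: reproduce the report's memory figures, ASSUMING every column is
# float64 (8 bytes per value) and the row index costs a fixed 128 bytes.
N_ROWS = 6497
N_COLS = 12
BYTES_PER_VALUE = 8   # assumption: float64 storage
INDEX_BYTES = 128     # assumption: overhead of a default RangeIndex


def kib(n_bytes: int) -> float:
    """Convert bytes to kibibytes (1 KiB = 1024 bytes)."""
    return n_bytes / 1024


column_bytes = N_ROWS * BYTES_PER_VALUE             # one column, index excluded
total_bytes = N_COLS * column_bytes + INDEX_BYTES   # whole table, index included

print(f"Memory size per column: {kib(column_bytes):.1f} KiB")   # 50.8 KiB
print(f"Total size in memory: {kib(total_bytes):.1f} KiB")      # 609.2 KiB
print(f"Average record size: {total_bytes / N_ROWS:.1f} B")     # 96.0 B
```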

Variable types

NUM: 12

Reproduction

Analysis started: 2020-07-17 21:03:27.048316
Analysis finished: 2020-07-17 21:04:08.526118
Duration: 41.48 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Dataset has 1179 (18.1%) duplicate rows [Duplicates]
citric acid has 151 (2.3%) zeros [Zeros]
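A duplicate row is identical to an earlier row in every column. A minimal sketch of that count follows, assuming the report counts each repeat after a row's first occurrence (the behaviour of pandas' DataFrame.duplicated with its default keep='first'). The toy input is the first five rows from the Sample section, where rows 0 and 4 are identical; `count_duplicate_rows` is a hypothetical helper name.

```python
from typing import Hashable, Iterable, Sequence


def count_duplicate_rows(rows: Iterable[Sequence[Hashable]]) -> int:
    """Count rows identical to an earlier row; first occurrences are not counted."""
    seen: set = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)  # a tuple is hashable, so it can go into the set
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Rows 0-4 of the "First rows" sample (displayed, rounded values).
sample = [
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.998, 3.51, 0.56, 9.4, 5),
    (7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.997, 3.20, 0.68, 9.8, 5),
    (7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.997, 3.26, 0.65, 9.8, 5),
    (11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.998, 3.16, 0.58, 9.8, 6),
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.998, 3.51, 0.56, 9.4, 5),
]
print(count_duplicate_rows(sample))   # 1

# The headline percentage follows from the report's own counts.
print(f"{100 * 1179 / 6497:.1f}%")    # 18.1%
```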

Variables

fixed acidity
Real number (ℝ≥0)

Distinct count: 106
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.215307064799139
Minimum: 3.8
Maximum: 15.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 3.8
5th percentile: 5.7
Q1: 6.4
Median: 7
Q3: 7.7
95th percentile: 9.8
Maximum: 15.9
Range: 12.1
Interquartile range (IQR): 1.3
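The reported values satisfy Range = Maximum - Minimum (15.9 - 3.8 = 12.1) and IQR = Q3 - Q1 (7.7 - 6.4 = 1.3). pandas' Series.quantile defaults to linear interpolation between neighbouring order statistics; the sketch below assumes the report's quantiles were computed that way. `quantile` and `quantile_summary` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    if not values:
        raise ValueError("quantile of empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    data = sorted(values)
    pos = q * (len(data) - 1)            # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def quantile_summary(values: Sequence[float]) -> Dict[str, float]:
    """The quantities listed under 'Quantile statistics'."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "minimum": min(values),
        "5th percentile": quantile(values, 0.05),
        "Q1": q1,
        "median": quantile(values, 0.5),
        "Q3": q3,
        "95th percentile": quantile(values, 0.95),
        "maximum": max(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }


summary = quantile_summary([1, 2, 3, 4])
print(summary["Q1"], summary["Q3"], summary["IQR"])   # 1.75 3.25 1.5
```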

Descriptive statistics

Standard deviation: 1.296433758
Coefficient of variation (CV): 0.1796782516
Kurtosis: 5.061160665
Mean: 7.215307065
Median Absolute Deviation (MAD): 0.6
Skewness: 1.723289647
Sum: 46877.85
Variance: 1.680740488
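The descriptive statistics are linked: CV = standard deviation / mean (1.296433758 / 7.215307065 ≈ 0.1797) and variance = standard deviation squared (≈ 1.6807). The sketch below computes the whole block for one column. It assumes the sample (n - 1) standard deviation and the bias-adjusted skewness (G1) and excess kurtosis (G2) that pandas' Series.std, Series.skew and Series.kurt return; the MAD is taken as the unscaled median of |x - median|, which matches the round values reported here. `descriptive_stats` is a hypothetical name.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive_stats(values: Sequence[float]) -> Dict[str, float]:
    """Sketch of the 'Descriptive statistics' block for one numeric column."""
    n = len(values)
    if n < 4:
        raise ValueError("sample kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    variance = statistics.variance(values)   # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Median absolute deviation: median of |x - median|, no scaling constant.
    mad = statistics.median([abs(x - median) for x in values])
    # Central moments about the mean (divided by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    # Bias-adjusted sample skewness (G1) and excess kurtosis (G2).
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1))
    return {
        "standard deviation": std,
        "coefficient of variation": std / mean,
        "kurtosis": kurtosis,
        "mean": mean,
        "MAD": mad,
        "skewness": skewness,
        "sum": math.fsum(values),
        "variance": variance,
    }


stats = descriptive_stats([1, 2, 3, 4, 5])
print(round(stats["kurtosis"], 6), stats["MAD"])   # -1.2 1
```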
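Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
6.8 | 354 | 5.4%
6.6 | 327 | 5.0%
6.4 | 305 | 4.7%
7 | 282 | 4.3%
6.9 | 279 | 4.3%
7.2 | 273 | 4.2%
6.7 | 264 | 4.1%
7.1 | 257 | 4.0%
6.5 | 242 | 3.7%
7.4 | 238 | 3.7%
Other values (96) | 3676 | 56.6%

Minimum 5 values
Value | Count | Frequency (%)
3.8 | 1 | < 0.1%
3.9 | 1 | < 0.1%
4.2 | 2 | < 0.1%
4.4 | 3 | < 0.1%
4.5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
15.9 | 1 | < 0.1%
15.6 | 2 | < 0.1%
15.5 | 2 | < 0.1%
15 | 2 | < 0.1%
14.3 | 1 | < 0.1%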
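A fixed-size-bin histogram splits the range from minimum to maximum into equal-width bins; for fixed acidity, 10 bins over 3.8 to 15.9 are each (15.9 - 3.8) / 10 = 1.21 wide. The sketch below assumes NumPy-style binning: bins are half-open [left, right) except the last, which also includes the maximum, and constant data is widened by 0.5 on each side, as NumPy's histogram does. Values sitting exactly on a bin edge may land differently from NumPy because of floating-point rounding. `fixed_size_histogram` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def fixed_size_histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Count values in `bins` equal-width bins spanning [min, max].

    Returns (edges, counts) with len(edges) == bins + 1.
    """
    if not values or bins < 1:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant data: widen the range so the bins have non-zero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        # Bin index from the offset; clamp so the maximum joins the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = fixed_size_histogram(list(range(11)), bins=10)
print(counts)   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
```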

volatile acidity
Real number (ℝ≥0)

Distinct count: 187
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3396659996921656
Minimum: 0.08
Maximum: 1.58
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 0.08
5th percentile: 0.16
Q1: 0.23
Median: 0.29
Q3: 0.4
95th percentile: 0.67
Maximum: 1.58
Range: 1.5
Interquartile range (IQR): 0.17

Descriptive statistics

Standard deviation: 0.1646364741
Coefficient of variation (CV): 0.4847010717
Kurtosis: 2.825372417
Mean: 0.3396659997
Median Absolute Deviation (MAD): 0.08
Skewness: 1.495096542
Sum: 2206.81
Variance: 0.0271051686
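Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.28 | 286 | 4.4%
0.24 | 266 | 4.1%
0.26 | 256 | 3.9%
0.25 | 238 | 3.7%
0.22 | 235 | 3.6%
0.27 | 232 | 3.6%
0.23 | 221 | 3.4%
0.2 | 217 | 3.3%
0.3 | 214 | 3.3%
0.32 | 205 | 3.2%
Other values (177) | 4127 | 63.5%

Minimum 5 values
Value | Count | Frequency (%)
0.08 | 4 | 0.1%
0.085 | 1 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 6 | 0.1%
0.105 | 6 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1.58 | 1 | < 0.1%
1.33 | 2 | < 0.1%
1.24 | 1 | < 0.1%
1.185 | 1 | < 0.1%
1.18 | 1 | < 0.1%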
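Each variable's frequency tables list the ten most common values plus an "Other values (k)" row, where k is the distinct count minus 10 (187 - 10 = 177 here), followed by the five smallest and five largest distinct values. A minimal sketch of those tables follows. The "< 0.1%" rule is inferred from the tables themselves (1 to 3 occurrences out of 6497 print as "< 0.1%", while 4 print as "0.1%"), not taken from the pandas-profiling source; all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, int, str]   # (value, count, frequency)


def format_share(count: int, total: int) -> str:
    """Percentage with one decimal; shares that would round to 0.0% print as '< 0.1%'."""
    share = 100 * count / total
    return "< 0.1%" if share < 0.05 else f"{share:.1f}%"


def common_values(values: Sequence[float], top: int = 10) -> List[Row]:
    """Most frequent values, plus one 'Other values (k)' row for the remainder."""
    counts = Counter(values)
    total = len(values)
    rows = [(str(v), c, format_share(c, total)) for v, c in counts.most_common(top)]
    n_other = len(counts) - len(rows)
    if n_other > 0:
        rest = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({n_other})", rest, format_share(rest, total)))
    return rows


def extreme_values(values: Sequence[float], k: int = 5, largest: bool = False) -> List[Row]:
    """The k smallest (or largest) distinct values with their counts."""
    counts = Counter(values)
    total = len(values)
    keys = sorted(counts, reverse=largest)[:k]
    return [(str(v), counts[v], format_share(counts[v], total)) for v in keys]


toy = [1, 1, 1, 2, 2, 3]
for row in common_values(toy, top=2):
    print(" | ".join(map(str, row)))
# 1 | 3 | 50.0%
# 2 | 2 | 33.3%
# Other values (1) | 1 | 16.7%
```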

citric acid
Real number (ℝ≥0)

ZEROS

Distinct count: 89
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3186332153301524
Minimum: 0.0
Maximum: 1.66
Zeros: 151
Zeros (%): 2.3%
Memory size: 50.8 KiB
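The Missing, Infinite and Zeros entries are plain counts with their share of all 6497 rows; for citric acid, 151 / 6497 ≈ 2.3%. A minimal sketch follows, assuming None and NaN both count as missing and that percentages use the full row count as the denominator. `special_value_counts` is a hypothetical name, and the column in the usage example is synthetic, built only to have the same zero count as citric acid.

```python
import math
from typing import Dict, Optional, Sequence


def special_value_counts(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Missing, infinite and zero counts, each with its share of all rows (in %)."""
    n = len(values)
    if n == 0:
        raise ValueError("empty column")
    # None and NaN are treated as missing; everything else is an observation.
    present = [x for x in values if x is not None and not math.isnan(x)]
    counts = {
        "missing": n - len(present),
        "infinite": sum(1 for x in present if math.isinf(x)),
        "zeros": sum(1 for x in present if x == 0),
    }
    result: Dict[str, float] = dict(counts)
    for name, count in counts.items():
        result[f"{name} (%)"] = 100 * count / n
    return result


# Synthetic column with the same zero count as citric acid (151 of 6497).
column = [0.0] * 151 + [0.3] * (6497 - 151)
summary = special_value_counts(column)
print(summary["zeros"], f"{summary['zeros (%)']:.1f}%")   # 151 2.3%
```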

Quantile statistics

Minimum: 0
5th percentile: 0.05
Q1: 0.25
Median: 0.31
Q3: 0.39
95th percentile: 0.56
Maximum: 1.66
Range: 1.66
Interquartile range (IQR): 0.14

Descriptive statistics

Standard deviation: 0.1453178649
Coefficient of variation (CV): 0.4560662791
Kurtosis: 2.397239216
Mean: 0.3186332153
Median Absolute Deviation (MAD): 0.07
Skewness: 0.4717306725
Sum: 2070.16
Variance: 0.02111728186
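Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.3 | 337 | 5.2%
0.28 | 301 | 4.6%
0.32 | 289 | 4.4%
0.49 | 283 | 4.4%
0.26 | 257 | 4.0%
0.34 | 249 | 3.8%
0.29 | 244 | 3.8%
0.27 | 236 | 3.6%
0.24 | 232 | 3.6%
0.31 | 230 | 3.5%
Other values (79) | 3839 | 59.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 151 | 2.3%
0.01 | 40 | 0.6%
0.02 | 56 | 0.9%
0.03 | 32 | 0.5%
0.04 | 41 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
1.66 | 1 | < 0.1%
1.23 | 1 | < 0.1%
1 | 6 | 0.1%
0.99 | 1 | < 0.1%
0.91 | 2 | < 0.1%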

residual sugar
Real number (ℝ≥0)

Distinct count: 316
Unique (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.443235339387409
Minimum: 0.6
Maximum: 65.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 0.6
5th percentile: 1.2
Q1: 1.8
Median: 3
Q3: 8.1
95th percentile: 15
Maximum: 65.8
Range: 65.2
Interquartile range (IQR): 6.3

Descriptive statistics

Standard deviation: 4.757803743
Coefficient of variation (CV): 0.8740764355
Kurtosis: 4.359271948
Mean: 5.443235339
Median Absolute Deviation (MAD): 1.7
Skewness: 1.435404263
Sum: 35364.7
Variance: 22.63669646
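Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2 | 235 | 3.6%
1.8 | 228 | 3.5%
1.6 | 223 | 3.4%
1.4 | 219 | 3.4%
1.2 | 195 | 3.0%
2.2 | 187 | 2.9%
2.1 | 179 | 2.8%
1.9 | 176 | 2.7%
1.7 | 175 | 2.7%
1.5 | 172 | 2.6%
Other values (306) | 4508 | 69.4%

Minimum 5 values
Value | Count | Frequency (%)
0.6 | 2 | < 0.1%
0.7 | 7 | 0.1%
0.8 | 25 | 0.4%
0.9 | 41 | 0.6%
0.95 | 4 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
65.8 | 1 | < 0.1%
31.6 | 2 | < 0.1%
26.05 | 2 | < 0.1%
23.5 | 1 | < 0.1%
22.6 | 1 | < 0.1%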

chlorides
Real number (ℝ≥0)

Distinct count: 214
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05603386178236109
Minimum: 0.009
Maximum: 0.611
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 0.009
5th percentile: 0.028
Q1: 0.038
Median: 0.047
Q3: 0.065
95th percentile: 0.102
Maximum: 0.611
Range: 0.602
Interquartile range (IQR): 0.027

Descriptive statistics

Standard deviation: 0.03503360137
Coefficient of variation (CV): 0.6252219686
Kurtosis: 50.89805146
Mean: 0.05603386178
Median Absolute Deviation (MAD): 0.011
Skewness: 5.399827732
Sum: 364.052
Variance: 0.001227353225
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.044 | 206 | 3.2%
0.036 | 200 | 3.1%
0.042 | 187 | 2.9%
0.046 | 185 | 2.8%
0.048 | 182 | 2.8%
0.04 | 182 | 2.8%
0.05 | 182 | 2.8%
0.047 | 175 | 2.7%
0.045 | 174 | 2.7%
0.038 | 169 | 2.6%
Other values (204) | 4655 | 71.6%

Minimum 5 values
Value | Count | Frequency (%)
0.009 | 1 | < 0.1%
0.012 | 3 | < 0.1%
0.013 | 1 | < 0.1%
0.014 | 4 | 0.1%
0.015 | 4 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
0.611 | 1 | < 0.1%
0.61 | 1 | < 0.1%
0.467 | 1 | < 0.1%
0.464 | 1 | < 0.1%
0.422 | 1 | < 0.1%

free sulfur dioxide
Real number (ℝ≥0)

Distinct count: 135
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.525319378174544
Minimum: 1.0
Maximum: 289.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 17
Median: 29
Q3: 41
95th percentile: 61
Maximum: 289
Range: 288
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 17.74939977
Coefficient of variation (CV): 0.5814648342
Kurtosis: 7.906238067
Mean: 30.52531938
Median Absolute Deviation (MAD): 12
Skewness: 1.220066074
Sum: 198323
Variance: 315.0411923
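Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
29 | 183 | 2.8%
6 | 170 | 2.6%
26 | 161 | 2.5%
15 | 157 | 2.4%
31 | 152 | 2.3%
24 | 152 | 2.3%
17 | 149 | 2.3%
34 | 146 | 2.2%
35 | 144 | 2.2%
23 | 142 | 2.2%
Other values (125) | 4941 | 76.1%

Minimum 5 values
Value | Count | Frequency (%)
1 | 3 | < 0.1%
2 | 2 | < 0.1%
3 | 59 | 0.9%
4 | 52 | 0.8%
5 | 129 | 2.0%

Maximum 5 values
Value | Count | Frequency (%)
289 | 1 | < 0.1%
146.5 | 1 | < 0.1%
138.5 | 1 | < 0.1%
131 | 1 | < 0.1%
128 | 1 | < 0.1%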

total sulfur dioxide
Real number (ℝ≥0)

Distinct count: 276
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 115.7445744189626
Minimum: 6.0
Maximum: 440.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 6
5th percentile: 19
Q1: 77
Median: 118
Q3: 156
95th percentile: 206
Maximum: 440
Range: 434
Interquartile range (IQR): 79

Descriptive statistics

Standard deviation: 56.52185452
Coefficient of variation (CV): 0.488332648
Kurtosis: -0.3716636549
Mean: 115.7445744
Median Absolute Deviation (MAD): 39
Skewness: -0.001177478234
Sum: 751992.5
Variance: 3194.720039
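Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
111 | 72 | 1.1%
113 | 65 | 1.0%
122 | 57 | 0.9%
117 | 57 | 0.9%
98 | 56 | 0.9%
114 | 56 | 0.9%
124 | 56 | 0.9%
128 | 56 | 0.9%
118 | 55 | 0.8%
150 | 54 | 0.8%
Other values (266) | 5913 | 91.0%

Minimum 5 values
Value | Count | Frequency (%)
6 | 3 | < 0.1%
7 | 4 | 0.1%
8 | 14 | 0.2%
9 | 15 | 0.2%
10 | 28 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
440 | 1 | < 0.1%
366.5 | 1 | < 0.1%
344 | 1 | < 0.1%
313 | 1 | < 0.1%
307.5 | 1 | < 0.1%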

density
Real number (ℝ≥0)

Distinct count: 20
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9947004771432968
Minimum: 0.987
Maximum: 1.039
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 0.987
5th percentile: 0.99
Q1: 0.992
Median: 0.995
Q3: 0.997
95th percentile: 0.999
Maximum: 1.039
Range: 0.052
Interquartile range (IQR): 0.005

Descriptive statistics

Standard deviation: 0.003009558063
Coefficient of variation (CV): 0.003025592258
Kurtosis: 6.502100343
Mean: 0.9947004771
Median Absolute Deviation (MAD): 0.002
Skewness: 0.4900584549
Sum: 6462.569
Variance: 9.057439735e-06
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.996 | 811 | 12.5%
0.997 | 720 | 11.1%
0.995 | 698 | 10.7%
0.992 | 656 | 10.1%
0.994 | 641 | 9.9%
0.998 | 640 | 9.9%
0.993 | 631 | 9.7%
0.991 | 521 | 8.0%
0.99 | 375 | 5.8%
0.999 | 338 | 5.2%
Other values (10) | 466 | 7.2%

Minimum 5 values
Value | Count | Frequency (%)
0.987 | 8 | 0.1%
0.988 | 17 | 0.3%
0.989 | 166 | 2.6%
0.99 | 375 | 5.8%
0.991 | 521 | 8.0%

Maximum 5 values
Value | Count | Frequency (%)
1.039 | 1 | < 0.1%
1.01 | 2 | < 0.1%
1.004 | 2 | < 0.1%
1.003 | 9 | 0.1%
1.002 | 15 | 0.2%

pH
Real number (ℝ≥0)

Distinct count: 108
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2185008465445586
Minimum: 2.72
Maximum: 4.01
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 2.72
5th percentile: 2.97
Q1: 3.11
Median: 3.21
Q3: 3.32
95th percentile: 3.5
Maximum: 4.01
Range: 1.29
Interquartile range (IQR): 0.21

Descriptive statistics

Standard deviation: 0.1607872021
Coefficient of variation (CV): 0.04995717254
Kurtosis: 0.3676572674
Mean: 3.218500847
Median Absolute Deviation (MAD): 0.11
Skewness: 0.3868387981
Sum: 20910.6
Variance: 0.02585252436
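Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
3.16 | 200 | 3.1%
3.14 | 193 | 3.0%
3.22 | 185 | 2.8%
3.2 | 176 | 2.7%
3.15 | 170 | 2.6%
3.19 | 170 | 2.6%
3.18 | 168 | 2.6%
3.24 | 161 | 2.5%
3.12 | 154 | 2.4%
3.1 | 154 | 2.4%
Other values (98) | 4766 | 73.4%

Minimum 5 values
Value | Count | Frequency (%)
2.72 | 1 | < 0.1%
2.74 | 2 | < 0.1%
2.77 | 1 | < 0.1%
2.79 | 3 | < 0.1%
2.8 | 3 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
4.01 | 2 | < 0.1%
3.9 | 2 | < 0.1%
3.85 | 1 | < 0.1%
3.82 | 1 | < 0.1%
3.81 | 1 | < 0.1%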

sulphates
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5312682776666154
Minimum: 0.22
Maximum: 2.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 0.22
5th percentile: 0.35
Q1: 0.43
Median: 0.51
Q3: 0.6
95th percentile: 0.79
Maximum: 2
Range: 1.78
Interquartile range (IQR): 0.17

Descriptive statistics

Standard deviation: 0.1488058736
Coefficient of variation (CV): 0.2800955372
Kurtosis: 8.653698823
Mean: 0.5312682777
Median Absolute Deviation (MAD): 0.08
Skewness: 1.797270004
Sum: 3451.65
Variance: 0.02214318802
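Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0.5 | 276 | 4.2%
0.46 | 243 | 3.7%
0.54 | 235 | 3.6%
0.44 | 232 | 3.6%
0.38 | 214 | 3.3%
0.48 | 208 | 3.2%
0.52 | 203 | 3.1%
0.49 | 197 | 3.0%
0.47 | 191 | 2.9%
0.45 | 190 | 2.9%
Other values (101) | 4308 | 66.3%

Minimum 5 values
Value | Count | Frequency (%)
0.22 | 1 | < 0.1%
0.23 | 1 | < 0.1%
0.25 | 4 | 0.1%
0.26 | 4 | 0.1%
0.27 | 13 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
2 | 1 | < 0.1%
1.98 | 1 | < 0.1%
1.95 | 2 | < 0.1%
1.62 | 1 | < 0.1%
1.61 | 1 | < 0.1%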

alcohol
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.49180021548407
Minimum: 8.0
Maximum: 14.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 8
5th percentile: 9
Q1: 9.5
Median: 10.3
Q3: 11.3
95th percentile: 12.7
Maximum: 14.9
Range: 6.9
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 1.192711764
Coefficient of variation (CV): 0.1136803732
Kurtosis: -0.5316882252
Mean: 10.49180022
Median Absolute Deviation (MAD): 0.9
Skewness: 0.5657174312
Sum: 68165.226
Variance: 1.422561352
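Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
9.5 | 367 | 5.6%
9.4 | 332 | 5.1%
9.2 | 271 | 4.2%
10 | 229 | 3.5%
10.5 | 227 | 3.5%
11 | 217 | 3.3%
9 | 215 | 3.3%
9.8 | 214 | 3.3%
10.4 | 194 | 3.0%
9.3 | 193 | 3.0%
Other values (101) | 4038 | 62.2%

Minimum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
8.4 | 5 | 0.1%
8.5 | 10 | 0.2%
8.6 | 23 | 0.4%
8.7 | 80 | 1.2%

Maximum 5 values
Value | Count | Frequency (%)
14.9 | 1 | < 0.1%
14.2 | 1 | < 0.1%
14.05 | 1 | < 0.1%
14 | 12 | 0.2%
13.9 | 3 | < 0.1%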

quality
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.818377712790519
Minimum: 3
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 50.8 KiB

Quantile statistics

Minimum: 3
5th percentile: 5
Q1: 5
Median: 6
Q3: 6
95th percentile: 7
Maximum: 9
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8732552715
Coefficient of variation (CV): 0.1500856965
Kurtosis: 0.2323222693
Mean: 5.818377713
Median Absolute Deviation (MAD): 1
Skewness: 0.1896226934
Sum: 37802
Variance: 0.7625747693
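Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
6 | 2836 | 43.7%
5 | 2138 | 32.9%
7 | 1079 | 16.6%
4 | 216 | 3.3%
8 | 193 | 3.0%
3 | 30 | 0.5%
9 | 5 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
3 | 30 | 0.5%
4 | 216 | 3.3%
5 | 2138 | 32.9%
6 | 2836 | 43.7%
7 | 1079 | 16.6%

Maximum 5 values
Value | Count | Frequency (%)
9 | 5 | 0.1%
8 | 193 | 3.0%
7 | 1079 | 16.6%
6 | 2836 | 43.7%
5 | 2138 | 32.9%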

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
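The definition above translates directly into code. The sketch below is a plain-Python illustration, not the report's implementation; it also demonstrates that any exact linear relation gives |r| = 1 whatever its slope. `pearson_r` is a hypothetical name, and returning NaN for a constant variable is an assumed convention.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    if sx == 0 or sy == 0:
        return math.nan   # r is undefined when either variable is constant
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# A shallow and a steep exact line: only the sign of the slope matters.
print(round(pearson_r(x, [0.1 * v + 7 for v in x]), 6))   # 1.0
print(round(pearson_r(x, [-50 * v for v in x]), 6))       # -1.0
```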

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one replaces each variable by its ranks and divides the covariance of the two rank variables by the product of the rank variables' standard deviations.
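The rank-then-correlate recipe can be sketched as follows. Tied values receive the average of the ranks they span; that is the usual convention and is assumed here, since the text does not say how ties are ranked. `average_ranks` and `spearman_rho` are hypothetical names.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; the (n - 1) factors of covariance and variances cancel."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed on the rank variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    return _pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))   # 1.0 (monotonic, though not linear)
```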

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

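To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations: a pair is concordant when X and Y order the two observations the same way, and discordant when they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ_a form; when the data contain many ties, as most columns here do (fixed acidity, for example, has only 106 distinct values among 6497 rows), the tie-corrected τ_b variant is commonly used instead.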
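Counting concordant and discordant pairs is an O(n²) loop over all pairs. The sketch below returns τ_a as described above and, as an option, τ_b, whose denominator discounts pairs tied in X or in Y. It is meant for illustration on small inputs, not for 6497 rows; `kendall_tau` and its `variant` parameter are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float], variant: str = "a") -> float:
    """Kendall's tau from concordant and discordant pairs (O(n^2) sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue                  # tied in both variables: counted nowhere
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1           # both variables order the pair the same way
        else:
            discordant += 1           # the variables order the pair oppositely
    if variant == "a":
        # Divide by all n(n - 1)/2 pairs, as in the description above.
        return (concordant - discordant) / (n * (n - 1) / 2)
    if variant == "b":
        # Tie-corrected denominator.
        denom = math.sqrt(
            (concordant + discordant + tied_x_only)
            * (concordant + discordant + tied_y_only)
        )
        return (concordant - discordant) / denom if denom else math.nan
    raise ValueError("variant must be 'a' or 'b'")


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))   # 0.6666666666666666
```

With ties the two variants diverge: for X = [1, 1, 2] and Y = [1, 2, 3], τ_a is 2/3 while τ_b is 2/√6 ≈ 0.816.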

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

Sample

First rows

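# | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5
1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.997 | 3.20 | 0.68 | 9.8 | 5
2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5
3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6
4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5
5 | 7.4 | 0.66 | 0.00 | 1.8 | 0.075 | 13.0 | 40.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5
6 | 7.9 | 0.60 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.996 | 3.30 | 0.46 | 9.4 | 5
7 | 7.3 | 0.65 | 0.00 | 1.2 | 0.065 | 15.0 | 21.0 | 0.995 | 3.39 | 0.47 | 10.0 | 7
8 | 7.8 | 0.58 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.997 | 3.36 | 0.57 | 9.5 | 7
9 | 7.5 | 0.50 | 0.36 | 6.1 | 0.071 | 17.0 | 102.0 | 0.998 | 3.35 | 0.80 | 10.5 | 5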

Last rows

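# | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
6487 | 6.8 | 0.220 | 0.36 | 1.20 | 0.052 | 38.0 | 127.0 | 0.993 | 3.04 | 0.54 | 9.2 | 5
6488 | 4.9 | 0.235 | 0.27 | 11.75 | 0.030 | 34.0 | 118.0 | 0.995 | 3.07 | 0.50 | 9.4 | 6
6489 | 6.1 | 0.340 | 0.29 | 2.20 | 0.036 | 25.0 | 100.0 | 0.989 | 3.06 | 0.44 | 11.8 | 6
6490 | 5.7 | 0.210 | 0.32 | 0.90 | 0.038 | 38.0 | 121.0 | 0.991 | 3.24 | 0.46 | 10.6 | 6
6491 | 6.5 | 0.230 | 0.38 | 1.30 | 0.032 | 29.0 | 112.0 | 0.993 | 3.29 | 0.54 | 9.7 | 5
6492 | 6.2 | 0.210 | 0.29 | 1.60 | 0.039 | 24.0 | 92.0 | 0.991 | 3.27 | 0.50 | 11.2 | 6
6493 | 6.6 | 0.320 | 0.36 | 8.00 | 0.047 | 57.0 | 168.0 | 0.995 | 3.15 | 0.46 | 9.6 | 5
6494 | 6.5 | 0.240 | 0.19 | 1.20 | 0.041 | 30.0 | 111.0 | 0.993 | 2.99 | 0.46 | 9.4 | 6
6495 | 5.5 | 0.290 | 0.30 | 1.10 | 0.022 | 20.0 | 110.0 | 0.989 | 3.34 | 0.38 | 12.8 | 7
6496 | 6.0 | 0.210 | 0.38 | 0.80 | 0.020 | 22.0 | 98.0 | 0.989 | 3.26 | 0.32 | 11.8 | 6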

Duplicate rows

Most frequent

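# | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | count
461 | 7.0 | 0.15 | 0.28 | 14.7 | 0.051 | 29.0 | 149.0 | 0.998 | 2.96 | 0.39 | 9.0 | 7 | 8
623 | 7.3 | 0.19 | 0.27 | 13.9 | 0.057 | 45.0 | 155.0 | 0.998 | 2.94 | 0.41 | 8.8 | 8 | 8
361 | 6.8 | 0.18 | 0.30 | 12.8 | 0.062 | 19.0 | 171.0 | 0.998 | 3.00 | 0.52 | 9.0 | 7 | 7
662 | 7.4 | 0.16 | 0.30 | 13.7 | 0.056 | 33.0 | 168.0 | 0.998 | 2.90 | 0.44 | 8.7 | 7 | 7
661 | 7.4 | 0.16 | 0.27 | 15.5 | 0.050 | 25.0 | 135.0 | 0.998 | 2.90 | 0.43 | 8.7 | 7 | 6
665 | 7.4 | 0.19 | 0.30 | 12.8 | 0.053 | 48.5 | 229.0 | 0.999 | 3.14 | 0.49 | 9.1 | 7 | 6
666 | 7.4 | 0.19 | 0.31 | 14.5 | 0.045 | 39.0 | 193.0 | 0.999 | 3.10 | 0.50 | 9.2 | 6 | 6
729 | 7.6 | 0.20 | 0.30 | 14.2 | 0.056 | 53.0 | 212.5 | 0.999 | 3.14 | 0.46 | 8.9 | 8 | 6
32 | 5.7 | 0.22 | 0.20 | 16.0 | 0.044 | 41.0 | 113.0 | 0.999 | 3.22 | 0.46 | 8.9 | 6 | 5
119 | 6.2 | 0.23 | 0.36 | 17.2 | 0.039 | 37.0 | 130.0 | 0.999 | 3.23 | 0.43 | 8.8 | 6 | 5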
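The "count" column gives how many times each distinct row occurs. A minimal sketch of building that table follows: group identical rows, keep groups seen more than once, and sort by frequency. If the Duplicate rows figure in the dataset statistics counts every repeat after a row's first occurrence (an assumption), then summing (count - 1) over all groups reproduces it. The function names and the toy rows are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def duplicate_groups(rows: Sequence[Sequence[Hashable]], top: int = 10) -> List[Tuple[tuple, int]]:
    """Distinct rows occurring more than once, most frequent first (ties keep first-seen order)."""
    counts = Counter(tuple(row) for row in rows)
    groups = [(row, n) for row, n in counts.most_common() if n > 1]
    return groups[:top]


def repeated_row_total(rows: Sequence[Sequence[Hashable]]) -> int:
    """Rows that repeat an earlier row: (count - 1) summed over every duplicate group."""
    return sum(n - 1 for _, n in duplicate_groups(rows, top=len(rows)))


# Hypothetical toy rows standing in for full 12-column records.
rows = [("a", 1), ("a", 1), ("a", 1), ("b", 2), ("b", 2), ("c", 3)]
print(duplicate_groups(rows))     # [(('a', 1), 3), (('b', 2), 2)]
print(repeated_row_total(rows))   # 3
```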